New functions added #13

SANKHA1 · 2024-11-07T17:24:07Z

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

N/A
Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
Application test
Document test
...

5. New dependencies

New Python dependencies
- Dependency1
- Dependency2
- ...
New Java/Scala dependencies and their license
- Dependency1 and license1
- Dependency2 and license2
- ...

* Add GLM4-Edge-V examples * polish readme * revert wrong changes * polish readme * polish readme * little polish in reference info and indent * Small fix and sample output updates * Update main readme --------- Co-authored-by: Yuwen Hu <[email protected]>

Oscilloscope98 and others added 30 commits November 4, 2024 09:42

Fix performance tests regarding trl version (#12319)

94ce447

* Fix performance tests regarding trl version * Small fix

Qwen layernorm as input (#12309)

c8679ad

* qwen layernorm as input * add group size

[NPU pipeline] update cmake usage of pipeline (#12320)

8fe01c9

Perf test further fix regarding trl version (#12321)

4644cb6

Doc: update harness readme (#12324)

a01371f

[NPU L0] Add layernorm weight as const / input setting (#12322)

5ee6f97

Add transformers_int4_npu_pipeline_win in all-in-one benchmark (#12325)

e54af44

* add transformers_int4_npu_pipeline_win * bugfix * bugfix: wrong actual_output_len * fix format * bugfix & update `README.md`

[NPU] Add env to disable compile opt (#12330)

94c4ce3

* add env to disable compile opt * fix style * fix style

Add chatglm2&3 fuse mlp (#12328)

1b637e4

* add chatglm fuse mlp

Add initial support for LNL nightly performance tests (#12326)

522cdf8

* Add initial support for LNL nightly performance tests * Small fix

Small fix to LNL performance tests (#12331)

e2adc97

update benchmark readme (#12323)

45b0d37

* update benchmark readme update new comment with memory usage included * Update README.md

Small fix to LNL performance tests (#12333)

923d696

Limit trl version in example (#12332)

82a61b5

* Limit trl version in example * Limit trl version in example

[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)

d872639

* support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print

fix chatglm2 cpu ut (#12336)

8e9a3a1

Add dummy model in iGPU perf (#12341)

7240c28

* Add dummy model in iGPU perf * Add dummy model in iGPU perf * Fix

Replace gradio_web_server.patch to adjust webui (#12329)

899a303

* replace gradio_web_server.patch to adjust webui * fix patch problem --------- Co-authored-by: ATMxsp01 <[email protected]>

[NPU] Hot fix of load_low_bit (#12344)

69e3a56

Add basic glm4v support (#12345)

c8b7265

optimize glm4v's vision part (#12346)

e23ef7d

Add MiniCPM-V-2_6 to arc perf test (#12349)

d984c06

llama 3.1/3.2 support compresskv (#12347)

f24352a

* llama 3.1/3.2 support compresskv * update * fix transformers 4.45 error * fix style * fix typo * disable llama3.2 1b compresskv

fix three NPU benchmark issues (#12350)

c267355

* fix three issues * limit mixed_precision for CW only

Small optimization to glm4 models (#12351)

872a744

[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)

a7b6668

* Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl

add minicpm-v models to transformers_int4_npu_win api (#12352)

79f2877

* add minicpm npu * optimize model

[NPU] acclib llama3.2 support groupwise (#12355)

d880e53

* change inter_pp * add comment

Update Readme for FastChat docker demo (#12354)

ce0c6ae

* update Readme for FastChat docker demo * update readme * add 'Serving with FastChat' part in docs * polish docs --------- Co-authored-by: ATMxsp01 <[email protected]>

Add troubleshootings for ollama and llama.cpp (#12358)

71ea539

* add ollama troubleshoot en * zh ollama troubleshoot * llamacpp trouble shoot * llamacpp trouble shoot * fix * save gpu memory

lzivan and others added 30 commits December 24, 2024 09:17

[NPU] support asym_int4 for baichuan (#12576)

c410d9c

* add npu support for baichuan * Update baichuan_mp.py * Update baichuan_mp.py

refactor baichuan, glm4 and minicpm3 (#12600)

7aaf02f

refactor mllama, gpt2 and internvl (#12602)

ad2dc96

[NPU] Fix minicpm on MTL (#12599)

45f8f72

refactor mistral and phi3 (#12605)

073f936

refactor chatglm2, internlm, stablelm and qwen (#12604)

4135b89

Update README.zh-CN.md (#12570)

9c9800b

add compresskv back for mistral (#12607)

4e6b9d8

* add compresskv back for mistral * fix * fix

Update README.zh-CN.md (#12610)

54b1d7d

fix llama related import (#12611)

5f5ac8a

rewrite llama optimization (#12609)

6249c1e

[docs] Update doc for latest open webui: 0.4.8 (#12591)

0477fe6

* Update open webui doc * Resolve comments

[NPU] fix npu save (#12614)

9e895f0

* fix npu save * update

remove bigdl-llm test to fix langchain UT (#12613)

a596f1a

Update Dockerfile (#12585)

28737c2

Polish Readme for ModelScope-related examples (#12603)

ef585d3

[NPU] update convert script based on latest usage (#12617)

d841e1d

small fix (#12616)

1604b4e

[NPU] Update prompt format for baichuan2 (#12615)

ccc4055

* Update baichuan2.py * style fix

Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618)

40a7d2b

* run c-eval on multi-GPUs * Update README.md

support passing attn_scale to sdpa (#12619)

a9abde0

[NPU] Compatible with other third-party models like auto-round (#12620)

bbdbbb0

* support third party model * simplify code * fix style * fix sym int4 GW * code refactor * fix

[NPU doc] Update verified platforms (#12621)

796ee57

small fix (#12623)

34dbdb8

[NPU] Update prompt format for baichuan2-pipeline (#12625)

5f04ed7

remove pipeline examples (#12626)

90f6709

[NPU] Fix regression caused by layer_norm change (#12627)

46eeab4

remove unused code again (#12624)

c72a5db

[NPU] Fix save-load usage of minicpm models (#12628)

f17ccfa
